Abstract

This paper considers estimation of a univariate density from an individual
numerical sequence. It is assumed that (i) the limiting relative frequencies of
the numerical sequence are governed by an unknown density, and (ii) there is a
known upper bound for the variation of the density on an increasing sequence
of intervals. A simple estimation scheme is proposed, and is shown to be $L_1$
consistent when (i) and (ii) apply. In addition it is shown that there is no
consistent estimation scheme for the set of individual sequences satisfying only
condition (i).
1 Introduction



Estimation of a univariate density from a finite data set is an important problem in 
theoretical and applied statistics. In the most common setting, it is assumed that 
data are obtained from a stationary process $X_1, X_2, \ldots$ such that

$$\mathbb{P}\{X_i \in A\} = \int_A f \, dx \quad \text{for every Borel set } A \subseteq \mathbb{R},$$

i.e. the common distribution of the $X_i$ has density $f$, written $X_i \sim f$. For each $n \geq 1$
an estimate $f_n$ of $f(\cdot)$ is produced from $X_1, \ldots, X_n$. The estimates $\{f_n\}$ are said to
be strongly $L_1$ consistent if $\int |f_n - f| \, dx \to 0$ as $n \to \infty$ with probability one.

Common density estimation methods include histogram, kernel, nearest neighbor, 
orthogonal series, wavelet, spline, and likelihood based procedures. For an account of 
these methods, we refer the interested reader to the texts of Devroye and Gyorfi [4], 
Silverman [19], Scott [18], and Wand and Jones [20]. In establishing consistency and
rates of convergence for estimation procedures like those above, many analyses assume
that $X_1, X_2, \ldots$ are independent and identically distributed (i.i.d.), in which case the
distribution of the process $\{X_i\}$ is completely specified by the marginal density $f$ of
$X_1$.

Complementing work for independent random variables, numerous results have 
also been obtained for stationary sequences exhibiting both short and long range 
dependence. Roussas [17] and Rosenblatt [16] studied the consistency and asymptotic 
normality of kernel density estimates from Markov processes. Similar results, under 
weaker conditions, were obtained by Yakowitz [21]. Gyorfi [5] showed that there is
a simple kernel-based procedure $\Phi$ that is strongly $L_2$-consistent for every stationary
ergodic process $\{X_i\}_{i=-\infty}^{\infty}$ such that (i) the conditional distribution of $X_1$ given
$\{X_i : i \leq 0\}$ is absolutely continuous with probability one, and (ii) the corresponding
conditional density $h$ satisfies $E \int |h(u)|^2 \, du < \infty$. For additional work in this area,
see also Ahmad [2], Castellana and Leadbetter [3], Gyorfi and Masry [7], Hall and 
Hart [9], and the references contained therein. 

With these positive results have come examples showing that density estimation
from strongly dependent processes can be problematic. In a result attributed to
Shields, it was shown by Gyorfi, Hardle, Sarda and Vieu [8] that there are histogram
density estimates, consistent for every i.i.d. process, that fail for some stationary
ergodic process. Gyorfi and Lugosi [6] established a similar result for ordinary kernel
estimates. Extending these results, Adams and Nobel [1] have recently shown that
there is no density estimation procedure that is consistent for every stationary ergodic
process.

With a view to considering density estimation in a more general setting, one may 
eliminate stochastic assumptions. Here we consider the estimation of an unknown 
density from an individual numerical sequence, which need not be the trajectory of 
a stationary stochastic process. We propose a simple estimation procedure that is 
applicable in a purely deterministic setting. This deterministic point of view is in 
line with recent work on individual sequences in information theory, statistics, and 
learning theory (cf. [22, 13, 12, 10]). Extending the techniques developed in this paper, 
Morvai, Kulkarni, and Nobel [14] consider the problem of regression estimation from 
individual sequences. 

In many cases, results based on deterministic analyses can be applied to individual 
sample paths in a stochastic setting. Theorem 1 of this paper yields a positive result 
concerning density estimation from ergodic processes (see Corollary 1 below). 

2 The Deterministic Setting 

Let $f : \mathbb{R} \to \mathbb{R}$ be a univariate density function with associated probability measure
$\mu_f(A) = \int_A f(x) \, dx$. An infinite sequence $\mathbf{x} = (x_1, x_2, \ldots)$ of numbers $x_i \in \mathbb{R}$ has
limiting density $f$ if

$$\hat{\mu}_n(A) = \frac{1}{n} \sum_{i=1}^{n} I\{x_i \in A\} \to \mu_f(A) \tag{1}$$

for every interval $A \subseteq \mathbb{R}$. A sequence $\mathbf{x}$ having a limiting density will be called
stationary. Let $\Omega(f)$ be the set of stationary sequences with limiting density $f$.

Note that stationarity concerns the limiting behavior of relative frequencies, which
need not converge to their corresponding probabilities at any particular rate.
Stationarity says nothing about the mechanism by which the individual sequence $\mathbf{x}$ is
produced. In particular, the limiting relative frequencies of a stationary sequence $\mathbf{x}$
are unchanged if one prepends to $\mathbf{x}$ a prefix of any finite length.
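As a small numerical illustration of these two points, the sketch below uses the base-2 van der Corput sequence. That sequence is not part of the development above; it is chosen here only as a convenient deterministic sequence whose relative frequencies over intervals converge to interval lengths, so that it is stationary with uniform limiting density on $[0, 1)$. The function names and the particular interval are illustrative assumptions.

```python
def van_der_corput(n, base=2):
    """Return the n-th term (n >= 1) of the base-`base` van der Corput sequence."""
    value, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        value += digit / denom
    return value


def relative_frequency(xs, a, b):
    """Fraction of the points xs that fall in the interval [a, b)."""
    return sum(1 for x in xs if a <= x < b) / len(xs)


# A deterministic sequence with uniform limiting density on [0, 1).
seq = [van_der_corput(i) for i in range(1, 20001)]

# Prepending an arbitrary finite prefix leaves the limiting frequencies unchanged,
# although it does perturb the frequencies at any fixed n.
prefixed = [0.9] * 500 + seq

interval = (0.25, 0.6)  # the uniform density assigns this interval mass 0.35
freq_plain = relative_frequency(seq, *interval)
freq_prefixed = relative_frequency(prefixed, *interval)
print(freq_plain, freq_prefixed)
```

Both printed frequencies should be close to 0.35, with the prefixed sequence lagging slightly because the 500 prepended points lie outside the interval; the gap shrinks as more terms are added.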

The sample paths of ergodic processes provide one source of stationary sequences. 
The next proposition follows easily from Birkhoff's ergodic theorem. 
Proposition 1 If $X_1, X_2, \ldots$ are stationary and ergodic with $X_i \sim f$, then $\mathbf{X} =
(X_1, X_2, \ldots) \in \Omega(f)$ with probability one.

A univariate density estimation scheme is a countable collection $\Phi$ of Borel-measurable
mappings $\phi_n : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}$, $n \geq 1$. Thus $\phi_n$ associates every vector
$(x_1, \ldots, x_n) \in \mathbb{R}^n$ with a function $\phi_n(\cdot : x_1, \ldots, x_n)$, which is viewed as the estimate
of an unknown density associated with the sequence $x_1, \ldots, x_n$. These estimates may
take negative values, and they need not integrate to one. In particular, no regularity
conditions are imposed on the behavior of $\phi_n$ as a function of its inputs.

A scheme $\Phi$ is $L_1$ consistent for a collection $\Omega$ of stationary sequences if for each
$\mathbf{x} \in \Omega$,

$$\int |\phi_n(x : x_1, \ldots, x_n) - f(x)| \, dx \to 0$$

as $n \to \infty$, where $f$ is the limiting density of $\mathbf{x}$. A scheme $\Phi$ is universal if it is
$L_1$ consistent for the set $\Omega^*$ of all stationary sequences. Note that, for i.i.d. data,
a density estimation scheme is called universal if it is consistent for every marginal
density $f$. The notion of universality defined above is considerably stronger, as there
are no constraints apart from stationarity placed on the structure of the individual
sequences. In what follows, when $\mathbf{x} = x_1, x_2, \ldots$ is fixed, $\phi_n(x : x_1, \ldots, x_n)$ will be
denoted by $\phi_n(x)$.

Recall that the total variation of a real-valued function $h$ defined on an interval
$[a, b) \subseteq \mathbb{R}$ is given by

$$V(h : a, b) = \sup \sum_{i=1}^{n} |h(t_i) - h(t_{i-1})|,$$

where the supremum is taken over all finite ordered sequences $a \leq t_0 < \cdots < t_n < b$.
For each nondecreasing function $\alpha : \mathbb{Z}^+ \to (0, \infty)$ let $\mathcal{F}(\alpha)$ be the set of all densities
$f$ on $\mathbb{R}$ such that $V(f : -i, i) < \alpha(i)$ for $i \geq 1$, and let

$$\Omega(\alpha) = \bigcup_{f \in \mathcal{F}(\alpha)} \Omega(f)$$

be the collection of all those stationary sequences having limiting densities in $\mathcal{F}(\alpha)$.

Given a function $\alpha(\cdot)$ as above, we propose a simple histogram based procedure
that is consistent for $\Omega(\alpha)$. For each $k \geq 1$ let $\pi_k$ be the partition of $\mathbb{R}$ into dyadic
intervals of the form

$$A_{k,j} = \left[ \frac{j}{2^k}, \frac{j+1}{2^k} \right) \quad \text{with } j \in \mathbb{Z},$$

and let $\pi_k[x]$ be the unique cell of $\pi_k$ containing $x$. Let $\{b_n\}$ be any sequence of
positive integers tending to infinity. For each sequence of numbers $x_1, \ldots, x_n$ and
each $k \geq 1$ define histogram density estimates

$$h_{n,k}(x) = \frac{2^k}{n} \sum_{i=1}^{n} I\{x_i \in \pi_k[x]\}. \tag{2}$$

Our estimate is chosen from among the histograms $h_{n,k}$ by selecting a suitable value
of $k$. Find the partition index

$$k_n = \max \{ 1 \leq k \leq b_n : V(h_{n,k} : -i, i) < 4\alpha(i) \text{ for } 1 \leq i \leq k \} \tag{3}$$

and define

$$\phi_n^*(x : x_1, \ldots, x_n) = h_{n,k_n}(x). \tag{4}$$

If the conditions defining $k_n$ are not satisfied for any $1 \leq k \leq b_n$, then set $\phi_n^* \equiv 0$.

Theorem 1 Let $\alpha : \mathbb{Z}^+ \to (0, \infty)$ be a fixed, non-decreasing function. The estimation
scheme $\Phi^* = \{\phi_n^*\}$ defined by (2)-(4) is $L_1$-consistent for $\Omega(\alpha)$. Thus for every
stationary sequence $\mathbf{x}$ with limiting density $f \in \mathcal{F}(\alpha)$, $\int |\phi_n^*(x) - f(x)| \, dx \to 0$.

Corollary 1 Let $\alpha(\cdot)$ be fixed and let $\Phi^*$ be defined by (2)-(4). For every stationary
ergodic process $\{X_i\}$ such that $X_i \sim f$ with $f \in \mathcal{F}(\alpha)$,

$$\int |\phi_n^*(x : X_1, \ldots, X_n) - f(x)| \, dx \to 0$$

as $n \to \infty$ with probability one.

Example: Fix $\gamma > 0$, and consider the class of stationary ergodic processes $\{X_i\}$
such that $X_i \sim f$ with $V(f : -\infty, \infty) < 2\gamma$. This class includes, but is not limited
to, processes having uniform, exponential, and normal marginal densities with
arbitrary means, under the restriction that $\mathrm{Var}(X_i)$ is greater than $(12\gamma^2)^{-1}$, $\gamma^{-2}$,
and $(2\pi\gamma^2)^{-1}$, respectively. By Corollary 1 there is a strongly consistent density
estimation procedure $\Phi^*$ for this class of processes.
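The three variance thresholds in the example can be checked numerically. The sketch below approximates the total variation of each density on a fine grid and compares it with $2\gamma$ for a variance slightly above and slightly below the stated threshold. The grid range, step count, the value $\gamma = 1.5$, and all function names are assumptions made for illustration only.

```python
import math


def total_variation(f, lo, hi, steps=200_000):
    """Approximate V(f : lo, hi) by summing |f(t_i) - f(t_{i-1})| on a uniform grid."""
    h = (hi - lo) / steps
    prev, total = f(lo), 0.0
    for i in range(1, steps + 1):
        cur = f(lo + i * h)
        total += abs(cur - prev)
        prev = cur
    return total


def uniform_density(var):
    half = math.sqrt(3.0 * var)  # uniform on [-half, half] has variance half^2 / 3
    return lambda x: 1.0 / (2.0 * half) if -half <= x <= half else 0.0


def exponential_density(var):
    rate = 1.0 / math.sqrt(var)  # Exp(rate) has variance 1 / rate^2
    return lambda x: rate * math.exp(-rate * x) if x >= 0.0 else 0.0


def normal_density(var):
    sd = math.sqrt(var)
    return lambda x: math.exp(-x * x / (2.0 * var)) / (sd * math.sqrt(2.0 * math.pi))


gamma = 1.5
thresholds = {
    "uniform": (uniform_density, 1.0 / (12.0 * gamma ** 2)),
    "exponential": (exponential_density, 1.0 / gamma ** 2),
    "normal": (normal_density, 1.0 / (2.0 * math.pi * gamma ** 2)),
}

results = {}
for name, (make_density, var_threshold) in thresholds.items():
    # Total variation for a variance 10% above and 10% below the threshold.
    above = total_variation(make_density(1.1 * var_threshold), -50.0, 50.0)
    below = total_variation(make_density(0.9 * var_threshold), -50.0, 50.0)
    results[name] = (above, below)
    print(name, above < 2 * gamma, below < 2 * gamma)
```

In each family the total variation is inversely proportional to the standard deviation, which is why a lower bound on the variance translates into an upper bound on the variation.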
Remark: The variations used to define $\phi_n^*$ depend on the cumulative difference
between the relative frequencies of adjacent cells:

$$V(h_{n,k} : -i, i) = 2^k \sum_{j=-i2^k}^{i2^k - 2} |\hat{\mu}_n(A_{k,j}) - \hat{\mu}_n(A_{k,j+1})|. \tag{5}$$

To find $\phi_n^*$, put $x_1, \ldots, x_n$ in increasing order, and then calculate $V(h_{n,k} : -i, i)$ for
each $k = 1, \ldots, b_n$ and each $i = 1, \ldots, k$ by scanning the ordered $x_i$ from left to right.
This will require at most $O(n \log n + n b_n)$ operations.
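The procedure in (2)-(5) can be sketched in a few lines. The code below is an illustrative implementation under stated assumptions, not the authors' code: it counts points per dyadic cell with a dictionary rather than scanning sorted data, so it does not attain the operation count quoted in the remark. The function names, the constant bound `alpha`, the choice `b_n = 8`, and the pseudo-random test data are all hypothetical.

```python
import math
import random
from collections import Counter


def cell_counts(xs, k):
    """Number of points in each dyadic cell A_{k,j} = [j 2^-k, (j+1) 2^-k), keyed by j."""
    return Counter(math.floor(x * 2 ** k) for x in xs)


def variation(counts, n, k, i):
    """V(h_{n,k} : -i, i) computed from (5) as 2^k times the summed jumps of the cell frequencies."""
    lo, hi = -i * 2 ** k, i * 2 ** k - 2  # range of j in the sum in (5)
    # A jump |c_j - c_{j+1}| can be nonzero only if cell j or cell j+1 is occupied.
    candidates = {j for c in counts for j in (c - 1, c) if lo <= j <= hi}
    jumps = sum(abs(counts.get(j, 0) - counts.get(j + 1, 0)) for j in candidates)
    return 2 ** k * jumps / n


def select_index(xs, alpha, b_n):
    """Partition index k_n of (3); 0 signals that no k in 1..b_n qualifies."""
    n, k_n = len(xs), 0
    for k in range(1, b_n + 1):
        counts = cell_counts(xs, k)
        if all(variation(counts, n, k, i) < 4 * alpha(i) for i in range(1, k + 1)):
            k_n = k
    return k_n


def estimate(xs, alpha, b_n):
    """Return (k_n, phi): phi is the histogram of (4), or the zero function if k_n == 0."""
    n, k_n = len(xs), select_index(xs, alpha, b_n)
    if k_n == 0:
        return 0, lambda x: 0.0
    counts = cell_counts(xs, k_n)
    return k_n, lambda x: 2 ** k_n * counts.get(math.floor(x * 2 ** k_n), 0) / n


random.seed(0)
xs = [random.random() for _ in range(4000)]  # stand-in data with uniform limiting density
alpha = lambda i: 2.5  # assumed bound: the uniform density has V(f : -i, i) <= 2 < alpha(i)
k_n, phi = estimate(xs, alpha, b_n=8)
print(k_n, phi(0.3), phi(-0.5))
```

The selected index balances resolution against roughness: finer partitions produce noisier histograms whose variation eventually exceeds $4\alpha(i)$, and those values of $k$ are rejected.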

In order to apply the procedure $\Phi^*$ described in (2)-(4), one must know before
seeing $\mathbf{x}$ that the variation of its limiting density is less than a known constant on
every interval of the form $[-i, i)$. The following result shows that this requirement
cannot be materially weakened.

Theorem 2 Let $\mathcal{F}$ be the collection of densities $f$ supported on $[0, 1]$ for which
$V(f : 0, 1)$ is finite. There is no $L_1$ consistent density estimation scheme for the
collection of stationary sequences having limiting densities in $\mathcal{F}$.
In particular, there is no universal density estimation scheme for individual sequences.

If an upper bound on the variation of the unknown density $f$ were known, the
scheme of Theorem 1 would provide consistent estimates of $f$.

Given any density estimation scheme $\Phi = \{\phi_n\}$, the proof of Theorem 2 shows
how one may construct a stationary sequence $\mathbf{x}$, depending on $\Phi$, for which
$\phi_n(\cdot : x_1, \ldots, x_n)$ fails to converge. A related argument is used by Adams and Nobel [1] to show that
there is no universal density estimation scheme for stationary ergodic processes. As 
a universal density estimation scheme for individual sequences would, by virtue of 
Proposition 1, yield a universal scheme for ergodic processes, their result also implies 
Theorem 2. 

The proof of Theorem 1 is given in the next section after several preliminary 
results. The proof of Theorem 2 is given in Section 4. 
3 Proof of Theorem 1

Definition: For each partition $\pi$ of $\mathbb{R}$ into finite intervals and each $f \in L_1$ define

$$(f \circ \pi)(x) = \frac{1}{\ell(\pi[x])} \int_{\pi[x]} f(u) \, du,$$

where $\ell(A)$ denotes the length of an interval $A$. Note that $f \circ \pi$ is piecewise constant
on the cells of $\pi$.

Lemma 1 Let $\pi_1, \pi_2, \ldots$ be the partitions used to define the estimates $\phi_n^*$. For each
pair of integers $k, i \geq 1$,

$$V(f \circ \pi_k : -i, i) \leq 3 V(f : -i, i).$$

Moreover, if $\mathbf{x} \in \Omega(f)$ then

$$\lim_{n \to \infty} V(h_{n,k} : -i, i) = V(f \circ \pi_k : -i, i).$$

Proof: For $f$ non-decreasing it is immediate that $V(f \circ \pi_k : -i, i) \leq V(f : -i, i)$. If
$V(f : -i, i) = C < \infty$ then $f(x) = u(x) - v(x)$ where $u(\cdot)$ and $v(\cdot)$ are non-decreasing,
$V(u : -i, i) \leq C$ and $V(v : -i, i) \leq 2C$ (cf. Kolmogorov and Fomin [11]). It follows
from the definition that $f \circ \pi_k = u \circ \pi_k - v \circ \pi_k$, and since $u$ and $v$ are non-decreasing,
so are $u \circ \pi_k$ and $v \circ \pi_k$. Therefore

$$\begin{aligned}
V(f \circ \pi_k : -i, i) &= V(u \circ \pi_k - v \circ \pi_k : -i, i) \\
&\leq V(u \circ \pi_k : -i, i) + V(v \circ \pi_k : -i, i) \\
&\leq V(u : -i, i) + V(v : -i, i) \\
&\leq 3C,
\end{aligned}$$

as the variation of a sum is no greater than the sum of the variations. To establish the
second claim, note that as $n \to \infty$

$$\begin{aligned}
V(h_{n,k} : -i, i) &= 2^k \sum_{j=-i2^k}^{i2^k-2} |\hat{\mu}_n(A_{k,j}) - \hat{\mu}_n(A_{k,j+1})| \\
&\to 2^k \sum_{j=-i2^k}^{i2^k-2} |\mu_f(A_{k,j}) - \mu_f(A_{k,j+1})| \\
&= V(f \circ \pi_k : -i, i).
\end{aligned}$$

□
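The first claim of Lemma 1 can be illustrated numerically. The sketch below takes a centered normal density (an assumption made for illustration, with standard deviation 0.5), computes the variation of its cell averages $f \circ \pi_k$ on $[-i, i)$ from the cell masses, and compares it with $3 V(f : -i, i)$. All names and parameter values are hypothetical.

```python
import math


def normal_cdf(x, sd):
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))


def normal_pdf(x, sd):
    return math.exp(-x * x / (2.0 * sd * sd)) / (sd * math.sqrt(2.0 * math.pi))


def variation_of_cell_averages(cdf, k, i):
    """V(f o pi_k : -i, i) = 2^k * sum_j |mu_f(A_{k,j}) - mu_f(A_{k,j+1})| over cells in [-i, i)."""
    width = 2.0 ** -k
    cells = range(-i * 2 ** k, i * 2 ** k)
    mass = [cdf((j + 1) * width) - cdf(j * width) for j in cells]
    return 2 ** k * sum(abs(a - b) for a, b in zip(mass, mass[1:]))


sd, i = 0.5, 2
# The normal density is unimodal and symmetric, so V(f : -i, i) = 2 * (f(0) - f(i)).
v_f = 2.0 * (normal_pdf(0.0, sd) - normal_pdf(float(i), sd))
v_cells = {k: variation_of_cell_averages(lambda x: normal_cdf(x, sd), k, i)
           for k in range(1, 9)}
for k, v in v_cells.items():
    print(k, round(v, 4), v <= 3.0 * v_f)
```

For this smooth unimodal density the variation of the cell averages stays below $V(f : -i, i)$ itself and approaches it as $k$ grows; the factor 3 in the lemma accommodates general densities of bounded variation.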
Lemma 2 Let $\mathbf{x} \in \Omega(\alpha)$ with limiting density $f \in \mathcal{F}(\alpha)$. Then the partition index
$k_n$ of the density estimate $\phi_n^*$ tends to infinity with $n$.

Proof: By Lemma 1, for arbitrary $K \geq 1$ and for all $i = 1, \ldots, K$,

$$\lim_{n \to \infty} V(h_{n,K} : -i, i) = V(f \circ \pi_K : -i, i) \leq 3 V(f : -i, i) < 3\alpha(i).$$

Thus by definition of $k_n$, $\liminf_{n \to \infty} k_n \geq K$. □

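Proof of Theorem 1: Let $\mathbf{x} \in \Omega(\alpha)$ be a fixed stationary sequence with limiting
density $f \in \mathcal{F}(\alpha)$. For each $n \geq 1$ such that $k_n \geq 1$ define the error function

$$g_n(x) = \phi_n^*(x : x_1, \ldots, x_n) - f(x) = h_{n,k_n}(x) - f(x),$$

and note that for all $1 \leq i \leq k_n$,

$$V(g_n : -i, i) \leq V(\phi_n^* : -i, i) + V(f : -i, i) < 5\alpha(i). \tag{6}$$

Fix $\epsilon > 0$. Select an integer $L \geq 1$ such that

$$\int_{|x| > L} f(x) \, dx < \epsilon, \tag{7}$$

and define

$$\delta = \frac{\epsilon}{L}. \tag{8}$$

Finally, choose an integer $K \geq 1$ so large that

$$2^{-K} < \frac{\epsilon \delta}{\alpha(L)(50\alpha(L) + 5\delta)}. \tag{9}$$

As $\mathbf{x} \in \Omega(f)$ and the partitions $\pi_k$ are nested, there exists an integer $N =
N(\mathbf{x}, \epsilon, f, \alpha)$ such that for $n \geq N$ one has $k_n \geq \max\{K, L\}$,

$$\left| \int_A g_n(x) \, dx \right| = |\hat{\mu}_n(A) - \mu_f(A)| < \frac{\delta}{2} \cdot 2^{-K} \tag{10}$$

for $A \in \pi_K$ with $A \subseteq [-L, L)$, and

$$|\hat{\mu}_n\{|x| > L\} - \mu_f\{|x| > L\}| < \epsilon. \tag{11}$$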

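For each $n$ let

$$H_n = \{x \in \mathbb{R} : |g_n(x)| > \delta\}$$

contain those points having large error, and let

$$\mathcal{H}_n = \{A \in \pi_K : A \cap H_n \neq \emptyset, \ A \subseteq [-L, L)\}.$$

Fix $n \geq N$ and consider a set $A \in \mathcal{H}_n$. By definition, there exists a point $x \in A$
such that $|g_n(x)| > \delta$. Assume for the moment that $g_n(x) > \delta$. It follows from (10)
that there is a point $y \in A$ such that $g_n(y) < \delta/2$, and therefore

$$\sup_{x, y \in A} |g_n(x) - g_n(y)| \geq \delta/2. \tag{12}$$

As $k_n \geq L$ the variation of $g_n$ on $A$ is less than $5\alpha(L)$ by (6), so that for each $z \in A$,

$$g_n(z) \leq g_n(y) + 5\alpha(L) < \frac{\delta}{2} + 5\alpha(L),$$

and

$$g_n(z) \geq g_n(x) - 5\alpha(L) > \delta - 5\alpha(L).$$

Therefore,

$$\sup_{z \in A} |g_n(z)| \leq \frac{\delta}{2} + 5\alpha(L). \tag{13}$$

A similar argument in the case $g_n(x) < -\delta$ shows that both (12) and (13) hold for
each $A \in \mathcal{H}_n$. It is immediate from (12) that

$$\frac{\delta}{2} |\mathcal{H}_n| \leq V(g_n : -L, L) < 5\alpha(L),$$

and consequently

$$|\mathcal{H}_n| < \frac{10\alpha(L)}{\delta}. \tag{14}$$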



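For each $n \geq N$ the integrated error between $\phi_n^*$ and $f$ may be decomposed as
follows:

$$\begin{aligned}
\int |\phi_n^*(x) - f(x)| \, dx
&\leq \sum_{A \in \mathcal{H}_n} \int_A |g_n(x)| \, dx + \sum_{A \notin \mathcal{H}_n, \, A \subseteq [-L, L)} \int_A |g_n(x)| \, dx + \int_{|x| > L} |g_n(x)| \, dx \\
&= \Theta_1 + \Theta_2 + \Theta_3.
\end{aligned}$$

Inequalities (13), (14) and (9) imply that

$$\Theta_1 \leq |\mathcal{H}_n| \cdot 2^{-K} \left( \frac{\delta}{2} + 5\alpha(L) \right) < \frac{10\alpha(L)}{\delta} \cdot 2^{-K} \left( \frac{\delta}{2} + 5\alpha(L) \right) < \epsilon,$$

and by virtue of (8),

$$\Theta_2 \leq \int_{[-L, L)} \delta \, dx = \delta \cdot 2L = 2\epsilon.$$

Finally, it follows from (7) and (11) that

$$\Theta_3 \leq \hat{\mu}_n\{|x| > L\} + \mu_f\{|x| > L\} \leq 3\epsilon.$$

Combining these three bounds shows that

$$\limsup_{n \to \infty} \int |\phi_n^*(x) - f(x)| \, dx \leq 6\epsilon,$$

and as $\epsilon$ was arbitrary, the desired $L_1$ convergence of $\phi_n^*$ to $f$ follows. □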

4 Proof of Theorem 2 

The following result can be established by a straightforward extension of the
Glivenko-Cantelli Theorem, or by a bracketing argument (cf. Pollard [15]).

Lemma 3 Let $\mathcal{A}$ be the collection of all (finite and infinite) intervals in $\mathbb{R}$. If
$\mathbf{x} \in \Omega(f)$ then

$$\sup_{A \in \mathcal{A}} |\hat{\mu}_n(A) - \mu_f(A)| \to 0.$$

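Proof of Theorem 2: Consider the family $\mathcal{F}_0 = \{h_1, h_2, \ldots\} \subseteq \mathcal{F}$ of Rademacher
densities where

$$h_k(x) = \begin{cases} 2 & \text{if } 2j \, 2^{-k} \leq x < (2j + 1) 2^{-k} \text{ for some } 0 \leq j < 2^{k-1}, \\ 0 & \text{otherwise.} \end{cases}$$

Note that each $h_j$ is supported on $[0, 1]$ and that $\int |h_j(x) - h_k(x)| \, dx = 1$ whenever
$j \neq k$. Let $\mu_k$ be the probability measure having density $h_k$, and for each finite
sequence $u_1, \ldots, u_m \in [0, 1]$ let

$$\Delta_k(u_1, \ldots, u_m) = \sup_{A \in \mathcal{A}} \left| \frac{1}{m} \sum_{i=1}^{m} I\{u_i \in A\} - \mu_k(A) \right|$$

measure the distance between $\mu_k$ and the empirical measure of $u_1, \ldots, u_m$.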
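The claim that distinct Rademacher densities are at $L_1$ distance one can be verified exactly for small indices. The sketch below evaluates $h_j$ and $h_k$ at the midpoints of the dyadic cells of width $2^{-\max(j,k)}$, on which both densities are constant, and uses exact rational arithmetic. The function names are illustrative assumptions.

```python
from fractions import Fraction


def rademacher_density(k, x):
    """h_k(x): 2 on [2j 2^-k, (2j+1) 2^-k) for 0 <= j < 2^(k-1), and 0 elsewhere."""
    if not 0 <= x < 1:
        return 0
    return 2 if int(x * 2 ** k) % 2 == 0 else 0


def l1_distance(j, k):
    """Exact integral of |h_j - h_k| over [0, 1), summed over cells of width 2^-max(j, k)."""
    m = 2 ** max(j, k)
    total = 0
    for c in range(m):
        mid = Fraction(2 * c + 1, 2 * m)  # midpoint of the c-th cell
        total += abs(rademacher_density(j, mid) - rademacher_density(k, mid))
    return Fraction(total, m)


for j in range(1, 5):
    print([str(l1_distance(j, k)) for k in range(1, 5)])
```

The printed table has zeros on the diagonal and ones elsewhere: any two distinct densities in the family disagree on exactly half of $[0, 1)$, where they differ by 2.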

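We show that if $\Phi$ is consistent for $\mathcal{F}_0$ (that is, for every stationary sequence with
limiting density in $\mathcal{F}_0$), then there is a stationary sequence $\mathbf{x}^*$ whose
limiting density is identically one on $[0, 1]$, but is such that $\phi_n(\cdot : x_1^*, \ldots, x_n^*)$ fails to
have a limit in $L_1$. For each $k \geq 1$ select a sequence $\mathbf{x}^{(k)} = (x_1^{(k)}, x_2^{(k)}, \ldots) \in \Omega(h_k)$
(e.g. a typical sample sequence from an i.i.d. process with density $h_k$), and define

$$m_k = \min \left\{ M : \sup_{m \geq M} \Delta_k(x_1^{(k)}, \ldots, x_m^{(k)}) \leq \frac{1}{k+1} \right\}.$$

Lemma 3 ensures that $m_k$ exists and is finite.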

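Fix any procedure $\Phi = \{\phi_1, \phi_2, \ldots\}$ that is consistent for $\mathcal{F}_0$ and consider the
infinite sequence $\mathbf{x}^{(1)}$. As $h_1 \in \mathcal{F}_0$,

$$\int |\phi_n(x : x_1^{(1)}, \ldots, x_n^{(1)}) - h_1(x)| \, dx \to 0$$

as $n \to \infty$. Therefore there is an integer $n_1 \geq m_2$ and a corresponding initial segment
$\mathbf{y}^{(1)} = x_1^{(1)}, \ldots, x_{n_1}^{(1)}$ of $\mathbf{x}^{(1)}$ such that

$$\int |\phi_{n_1}(x : \mathbf{y}^{(1)}) - h_1(x)| \, dx \leq \frac{1}{4} \quad \text{and} \quad \Delta_1(\mathbf{y}^{(1)}) \leq \frac{1}{2}.$$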

Now suppose that one has constructed a sequence $\mathbf{y}^{(k)}$ of finite length $n_k$ from
initial segments of $\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(k)}$ such that

$$\int |\phi_{n_k}(x : \mathbf{y}^{(k)}) - h_k(x)| \, dx \leq 1/4, \tag{15}$$

$$\Delta_k(\mathbf{y}^{(k)}) \leq (k + 1)^{-1}, \tag{16}$$

and

$$n_k \geq k \cdot m_{k+1}. \tag{17}$$

As $\mathbf{y}^{(k)}$ is finite, the concatenation $\mathbf{y}^{(k)} \cdot \mathbf{x}^{(k+1)}$ is contained in $\Omega(h_{k+1})$. It follows from
the consistency of $\Phi$ and Lemma 3 that when $n$ is large enough each initial segment
$\mathbf{y}^{(k+1)} = \mathbf{y}^{(k)} \cdot (x_1^{(k+1)}, \ldots, x_n^{(k+1)})$ of $\mathbf{y}^{(k)} \cdot \mathbf{x}^{(k+1)}$ satisfies (15) and (16) with $k$ replaced
by $k + 1$. Select $n_{k+1} > n_k$ so large that the same is true of (17).

As $\mathbf{y}^{(k+1)}$ is a proper extension of $\mathbf{y}^{(k)}$, repeating the above process indefinitely
yields an infinite sequence $\mathbf{x}^*$. By construction, the functions $\phi_n(\cdot) = \phi_n(\cdot : x_1^*, \ldots, x_n^*)$
do not converge in $L_1$. Indeed, it follows from (15) and the triangle inequality that
$\int |\phi_{n_k} - \phi_{n_l}| \, dx \geq 1/2$ whenever $k \neq l$.

It remains to show that the limiting density of $\mathbf{x}^*$ is uniform on $[0, 1]$. To this end,
fix $k \geq 1$ and let $A \subseteq [0, 1]$ be an interval of length $\ell(A)$. It is easily verified that

$$|\mu_k(A) - \ell(A)| \leq 2^{-k+1} \leq \frac{1}{k}. \tag{18}$$
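The interval bound (18), read here as $|\mu_k(A) - \ell(A)| \leq 2^{-k+1} \leq 1/k$, can be checked exactly on a grid of intervals. The sketch below assumes that $h_k$ equals 2 on the even-indexed dyadic cells of width $2^{-k}$ in $[0, 1)$ and 0 on the odd-indexed ones, computes $\mu_k([0, t))$ in closed form, and takes the worst deviation over all intervals with endpoints on a grid of mesh $1/96$. The grid and the function names are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations


def rademacher_cdf(k, t):
    """mu_k([0, t)) for 0 <= t <= 1, where h_k equals 2 on even-indexed cells of width 2^-k."""
    w = Fraction(1, 2 ** k)
    c = t // w                     # number of complete cells to the left of t
    mass = 2 * w * ((c + 1) // 2)  # each even-indexed complete cell carries mass 2w
    if c % 2 == 0:
        mass += 2 * (t - c * w)    # partial mass inside an even-indexed cell
    return mass


endpoints = [Fraction(i, 96) for i in range(97)]
worst = {}
for k in range(1, 7):
    worst[k] = max(
        abs((rademacher_cdf(k, b) - rademacher_cdf(k, a)) - (b - a))
        for a, b in combinations(endpoints, 2)
    )
    print(k, worst[k], worst[k] <= Fraction(2, 2 ** k) <= Fraction(1, k))
```

The deviation is small because the excess mass on an even-indexed cell is cancelled by the deficit on the neighbouring odd-indexed cell, so only the partial cells at the two ends of the interval contribute.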
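Let $\hat{\mu}_n(A)$ be the empirical distribution of $A$ under $x_1^*, \ldots, x_n^*$, and for each
$1 \leq r \leq n_{k+1} - n_k$ define

$$\mu_r'(A) = \frac{1}{r} \sum_{j=n_k+1}^{n_k+r} I\{x_j^* \in A\}.$$

It follows from the equation

$$\hat{\mu}_{n_k+r}(A) = \frac{n_k}{n_k + r} \hat{\mu}_{n_k}(A) + \frac{r}{n_k + r} \mu_r'(A)$$

that the difference satisfies

$$|\hat{\mu}_{n_k+r}(A) - \ell(A)| \leq \frac{n_k}{n_k + r} |\hat{\mu}_{n_k}(A) - \ell(A)| + \frac{r}{n_k + r} |\mu_r'(A) - \ell(A)| = I + II.$$

By virtue of (16) and (18),

$$I \leq |\hat{\mu}_{n_k}(A) - \mu_k(A)| + |\ell(A) - \mu_k(A)| \leq \frac{1}{k+1} + \frac{1}{k}.$$

If $n_{k+1} - n_k \geq r \geq m_{k+1}$ then

$$\Delta_{k+1}(x_{n_k+1}^*, \ldots, x_{n_k+r}^*) = \Delta_{k+1}(x_1^{(k+1)}, \ldots, x_r^{(k+1)}) \leq \frac{1}{k+2},$$

and therefore

$$II \leq |\mu_r'(A) - \mu_{k+1}(A)| + |\mu_{k+1}(A) - \ell(A)| \leq \frac{1}{k+2} + \frac{1}{k+1}.$$

On the other hand, if $1 \leq r < m_{k+1}$ then (17) implies that

$$II \leq \frac{2r}{n_k + r} \leq \frac{2r}{kr + r} = \frac{2}{k+1}.$$

These bounds ensure that

$$\max\{|\hat{\mu}_n(A) - \ell(A)| : n_k < n \leq n_{k+1}\} \leq \frac{4}{k},$$

and consequently

$$\lim_{n \to \infty} |\hat{\mu}_n(A) - \ell(A)| = 0.$$

As $A \in \mathcal{A}$ was arbitrary, $\mathbf{x}^*$ is stationary with limiting density $f(x) = 1$ on $[0, 1]$. □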